Notes:
library(ggplot2)
pf <- read.csv('pseudo_facebook.tsv', sep = '\t')
ggplot(aes(x = age, y = friend_count), data = pf) +
  geom_point(alpha = 1/20) +
  xlim(13, 90)
## Warning: Removed 4906 rows containing missing values (geom_point).
ggplot(aes(x = age, y = friendships_initiated), data = pf) +
  geom_point(alpha = 1/10) +
  xlim(13, 90) +
  coord_trans(y = 'sqrt')
## Warning: Removed 4906 rows containing missing values (geom_point).
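The warnings above are a side effect of `xlim()`: values outside the scale limits are treated as missing and dropped before the points are drawn. A minimal base-R sketch of that count follows; `toy_age` is a made-up vector standing in for `pf$age`, not data from the lesson.

```r
# Hypothetical ages; the real column is pf$age.
toy_age <- c(13, 25, 40, 90, 95, 103, 108)

# xlim(13, 90) keeps values inside [13, 90]; anything outside is
# treated as missing and removed by geom_point().
outside <- toy_age < 13 | toy_age > 90
n_removed <- sum(outside)
print(n_removed)
```

On the real data, `sum(pf$age < 13 | pf$age > 90)` would be expected to match the count in the warning. Using `coord_cartesian(xlim = c(13, 90))` instead zooms the view without dropping any rows.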
Response:
Notes:
Response:
Notes:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
pf.fc_by_age <- pf %>%
  group_by(age) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  arrange(age)
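As a cross-check on the grouped summary above, the same three statistics can be computed in base R with `tapply()`. This is only a sketch on a made-up data frame; `toy` and `toy_by_age` are hypothetical names, and the values are not from `pf`.

```r
# Hypothetical toy data with the same two columns used in the summary.
toy <- data.frame(
  age = c(13, 13, 14, 14, 14),
  friend_count = c(10, 30, 0, 5, 100)
)

# One statistic per age group, computed separately.
means   <- tapply(toy$friend_count, toy$age, mean)
medians <- tapply(toy$friend_count, toy$age, median)
counts  <- tapply(toy$friend_count, toy$age, length)

# Assemble a table shaped like pf.fc_by_age (already ordered by age).
toy_by_age <- data.frame(
  age = as.numeric(names(means)),
  friend_count_mean = as.vector(means),
  friend_count_median = as.vector(medians),
  n = as.vector(counts)
)
print(toy_by_age)
```

In the toy data the mean for age 14 is pulled well above the median by a single large value, which is the kind of skew that makes it worth keeping both columns.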
Create your plot!
Notes:
ggplot(aes(x = age, y = friend_count), data = pf) +
  xlim(13, 90) +
  geom_point(alpha = 1/20, position = position_jitter(h = 0), color = 'orange') +
  coord_trans(y = 'sqrt') +
  # `fun` replaces `fun.y`, which is deprecated as of ggplot2 3.3.0
  geom_line(stat = 'summary', fun = mean) +
  # dashed lines: 10th and 90th percentiles; solid blue line: median
  geom_line(stat = 'summary', fun = quantile, color = 'blue', fun.args = list(probs = 0.9), linetype = 2) +
  geom_line(stat = 'summary', fun = quantile, color = 'blue', fun.args = list(probs = 0.5)) +
  geom_line(stat = 'summary', fun = quantile, color = 'blue', fun.args = list(probs = 0.1), linetype = 2)
## Warning: Removed 4906 rows containing non-finite values (stat_summary).
## Warning: Removed 4906 rows containing non-finite values (stat_summary).
## Warning: Removed 4906 rows containing non-finite values (stat_summary).
## Warning: Removed 4906 rows containing non-finite values (stat_summary).
## Warning: Removed 5199 rows containing missing values (geom_point).
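The three blue lines in the plot above come from calling `quantile()` on the friend counts within each age, with `probs` set to 0.1, 0.5, and 0.9. A small sketch of what those calls return for a single group, using a made-up vector (`toy_counts` is hypothetical):

```r
# Hypothetical friend counts for one age group.
toy_counts <- 0:10

# 10th percentile, median, and 90th percentile (R's default method, type 7).
q <- quantile(toy_counts, probs = c(0.1, 0.5, 0.9))
print(q)
```

Each summary layer repeats this computation for every distinct `age`, then connects the results with a line.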
Response:
?coord_cartesian()
See the Instructor Notes of this video to download Moira’s paper on perceived audience size and to see the final plot.
Notes:
?cor.test
Look up the documentation for the cor.test function.
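As a sketch of how `cor.test()` is called and how to pull out the estimate, here is a run on two short made-up vectors. The names `x`, `y`, `result`, and `r` are hypothetical, and the value printed here says nothing about the lesson data.

```r
# Hypothetical paired observations.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 1, 4, 3, 5)

# Pearson's product-moment correlation is the default method.
result <- cor.test(x, y, method = "pearson")

# The estimate is a named numeric; unname() drops the "cor" label.
r <- unname(result$estimate)
print(round(r, 3))

# On the lesson data the analogous call would be:
# with(pf, cor.test(age, friend_count, method = "pearson"))
```

`cor(x, y)` returns the same coefficient without the test statistic, p-value, or confidence interval.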
What’s the correlation between age and friend count? Round to three decimal places.
Response:
Notes:
colnames(pf)
## [1] "userid" "age"
## [3] "dob_day" "dob_year"
## [5] "dob_month" "gender"
## [7] "tenure" "friend_count"
## [9] "friendships_initiated" "likes"
## [11] "likes_received" "mobile_likes"
## [13] "mobile_likes_received" "www_likes"
## [15] "www_likes_received"
ggplot(aes(x = www_likes_received, y = likes_received), data = pf) +
  geom_point() +
  coord_cartesian(xlim = c(0, 10000), ylim = c(0, 40000))
cor.test(pf$www_likes_received, pf$likes_received)
##
## Pearson's product-moment correlation
##
## data: pf$www_likes_received and pf$likes_received
## t = 937.1, df = 99001, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9473553 0.9486176
## sample estimates:
## cor
## 0.9479902
Notes:
What’s the correlation between the two variables? Include the top 5% of values for the variable in the calculation and round to 3 decimal places.
Response:
Notes:
Create your plot!
library(alr3)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
data(Mitchell)
ggplot(aes(x = Month, y = Temp), data = Mitchell) +
geom_point()
Take a guess for the correlation coefficient for the scatterplot.
What is the actual correlation of the two variables? (Round to the thousandths place)
cor.test(Mitchell$Month, Mitchell$Temp)
##
## Pearson's product-moment correlation
##
## data: Mitchell$Month and Mitchell$Temp
## t = 0.81816, df = 202, p-value = 0.4142
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08053637 0.19331562
## sample estimates:
## cor
## 0.05747063
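A coefficient this close to zero only says there is little linear association; it does not rule out other kinds of structure. A minimal sketch with made-up values (`x` and `y` are hypothetical, unrelated to `Mitchell`), where `y` is completely determined by `x` yet the Pearson correlation is zero:

```r
# Hypothetical data: y is an exact function of x, but not a linear one.
x <- -5:5
y <- x^2

# Pearson's r measures linear association only.
r_nonlinear <- cor(x, y)
print(round(r_nonlinear, 3))
```

This is why it helps to look at the scatterplot itself, at more than one scale, before reading much into the coefficient.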
?Mitchell
### Making Sense of Data
Notes:
ggplot(aes(x = Month, y = Temp), data = Mitchell) +
geom_point() +
scale_x_continuous(breaks = seq(0, 204, 12))
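Placing breaks every 12 months suggests looking at the data by month of year. One way to do that, sketched here on a made-up month index (`toy_month` is hypothetical and assumed to count months consecutively), is the modulo operator `%%`:

```r
# Hypothetical consecutive month index covering three years.
toy_month <- 0:35

# Remainder after dividing by 12 gives a month-of-year value from 0 to 11.
month_of_year <- toy_month %% 12

# Each month-of-year value appears once per year.
print(table(month_of_year))
```

Assuming `Mitchell$Month` is such a consecutive index, mapping `x = Month %% 12` in `aes()` would overlay the years on a single 12-month axis.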
What do you notice?
Response:
Watch the solution video and check out the Instructor Notes!
Notes:
pf$age_with_months <- pf$age + (1 - pf$dob_month / 12)
p1 <- ggplot(aes(x = age, y = friend_count_mean), data = pf.fc_by_age) +
geom_line()
p1
library(dplyr)
pf.fc_by_age_months <- pf %>%
  group_by(age_with_months) %>%
  summarise(friend_count_mean = mean(friend_count),
            friend_count_median = median(friend_count),
            n = n()) %>%
  arrange(age_with_months)
head(pf.fc_by_age_months)
## # A tibble: 6 x 4
## age_with_months friend_count_mean friend_count_median n
## <dbl> <dbl> <dbl> <int>
## 1 13.2 46.3 30.5 6
## 2 13.2 115. 23.5 14
## 3 13.3 136. 44 25
## 4 13.4 164. 72 33
## 5 13.5 131. 66 45
## 6 13.6 157. 64 54
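The fractional ages in the first column come from the `age_with_months` formula, `age + (1 - dob_month / 12)`. A short worked sketch on made-up rows (the data frame `toy` is hypothetical) shows how the birth month turns into a fraction of a year:

```r
# Hypothetical users: same columns the formula reads from pf.
toy <- data.frame(
  age = c(36, 36, 13),
  dob_month = c(3, 12, 10)
)

# Same expression as in the lesson, applied to the toy rows.
toy$age_with_months <- toy$age + (1 - toy$dob_month / 12)
print(toy)
```

Under this formula a December birthday adds nothing to the whole-year age and a March birthday adds 0.75. The tibble printout above shows three significant digits, so a value such as 13 + 1/6 is displayed as 13.2.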
p2 <- ggplot(aes(x = age_with_months, y = friend_count_mean), data = subset(pf.fc_by_age_months,age_with_months < 71)) +
geom_line()
p2
#### Now, plot the previous two plots together, stacked in a single column:
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(p1, p2, ncol = 1)
Programming Assignment
data("diamonds")
colnames(diamonds)
## [1] "carat" "cut" "color" "clarity" "depth" "table" "price"
## [8] "x" "y" "z"
ggplot(aes(x = x, y = price), data = diamonds) +
geom_point()
Notes:
Reflection:
Click KnitHTML to see all of your hard work and to have an html page of this lesson, your answers, and your notes!